Skip to content

Conversation

@keldonin
Copy link
Contributor

X-Mas present.

This pull request is bringing a whole lot of changes to the project, that were pending for a while. As these commits have grown on top of each others, they have been grouped together.

This pull request:

  • uses high resolution timers (typically in the range of ns) to improve precision of measurement, by the way reducing dependency to Boost
  • introduces the use of units for automatic conversion
  • considers an additional log-normal distribution
  • adds quantiles (p95,p98,p99)
  • adds Lilliefors fitness test against both normal and log-normal distributions
  • data points can now be recorded with new option (-d); data points can be extracted to a spreadsheet using the new utility datapoints2xlsx
  • adds support for RSA PSS testing
  • adds support for C_FindObjects() testing
  • graphs improvements: latency percentiles have now a scale distinct from average, coloured curves and axis, coloured horizontal lines
  • flow: when a key generation fails, the corresponding test cases are removed from the test cases, and the application no longer terminates
  • C_SeedRandom() test case uses random numbers for the seed
  • compilation warning under CLANG removed (using std::abs instead of abs)

…lution_clock + units

- boost::chrono and boost::timer dependencies removed
- using std::high_resolution_clock as well as std:steady_clock
- using std units for increased readability
- adding p95, p98 and p99 quantiles, using accumulator with cache size
of 1000
- fixed indentation
…ers - using log(1+x) for log normal distrib

- Addition of datapoints in results
- rationalized normality tests
- use of log(1+x) for small values when estimating log normal distrib
- datapoints2xlsx to extract datapoints
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces significant enhancements to the performance testing framework, focusing on improved timing precision, statistical analysis, and test coverage. The changes replace Boost Timer with C++ chrono for high-resolution timing, add log-normal distribution analysis with Lilliefors goodness-of-fit tests, and introduce new test cases for RSA-PSS signatures and C_FindObjects operations.

Key changes:

  • High-resolution timer implementation using std::chrono with custom duration types
  • Statistical enhancements including quantiles (p95, p98, p99) and log-normal distribution analysis
  • New test coverage for RSA-PSS and C_FindObjects() operations
  • Improved error handling that continues execution when key generation fails

Reviewed changes

Copilot reviewed 36 out of 36 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
src/units.hpp Defines custom duration types for nanoseconds, microseconds, and milliseconds with double precision
src/timeprecision.hpp/cpp Updates clock precision measurement to use custom duration types
src/p11benchmark.hpp/cpp Replaces Boost timer with chrono-based timing implementation
src/executor.hpp/cpp Adds Kolmogorov-Smirnov testing, quantile calculations, and datapoints output support
src/measure.hpp/cpp Enhances Measure class to support values without error measurements
src/p11rsapss.hpp/cpp Implements RSA-PSS signature benchmark test case
src/p11findobjects.hpp/cpp Implements C_FindObjects benchmark test case
src/p11seedrandom.cpp Updates to use random seed data instead of fixed values
src/keygenerator.hpp/cpp Changes key generation to return success/failure status instead of throwing exceptions
src/p11perftest.cpp Major refactoring to handle key generation failures gracefully and add new test cases
scripts/gengraphs.py Adds multiprocessing support, improved graph styling, and percentile visualization
scripts/datapoints2xlsx.py New utility to extract datapoints from JSON to Excel spreadsheets
configure.ac, Makefile.am Updates dependencies removing Boost Timer/Chrono requirements

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +125 to +127
label_data[label_len - 1] = '0' + (target_index % 10);
label_data[label_len - 2] = '0' + ((target_index / 10) % 10);
label_data[label_len - 3] = '0' + ((target_index / 100) % 10);
Copy link

Copilot AI Dec 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code assumes the target_index is always less than 1000 (3 digits). If num_objects exceeds 999, this will produce incorrect label modifications. Consider adding a check or using a more robust string formatting approach like std::to_string with proper padding.

Copilot uses AI. Check for mistakes.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This section is in the critical path, hence the reason why std::to_string is not used. Also, the counter is modulo 512 (earlier in the same file). It means that target index will never be longer than 3 digits.

Copy link

@covertmatthew covertmatthew Jan 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@keldonin please correct me if I'm wrong...

The way I'm reading this is that we’re precomputing 512 random indices, but the value at each index is drawn from uniform_int_distribution(0, num_objects - 1). Magnitudes therefore follow num_objects (which can be ≥1000). The modulo wraps only the iteration counter into the 512 slots; it does not cap the stored values.

Examples:

  • num_objects = 2,000: a stored index could be 1234; on iteration N, target_index = 1234. We only rewrite the last 3 digits, so the template becomes …-234 and misses …-001234.
  • num_objects = 10,000: a stored index like 7821 rewrites to …-821, also missing.
  • Any num_objects ≥ 1001 allows 4–6 digit target_index; num_objects = 1000 is the last safe case (max index 999).

In short: modulo 512 only wraps the slot; the fetched target_index can still be 4–6 digits because that’s what was stored. Do we not need to rewrite all 6 digits to match the generated labels?

gengraphs.py: removed unnecessary format methods in string parsing,
issue found after Copilot review
size_t count = 0;
for(const auto& elapsed : elapsed_array) {
// Only consider successful measurements
if(std::holds_alternative<benchmark_result::Ok>(elapsed.second)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Weird, your re-indent has made this worse...


auto vector_size = m_vectors.at(testcase).size();
auto stats_count = stats["count"]();

Copy link

@covertmatthew covertmatthew Jan 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code calculates percentiles/Kolmogorov–Smirnov/TPS even when a testcase returns no valid samples. Suggestion to add a guard right after stats_count is computed to short-circuit on (a) any non-OK thread result or (b) zero samples: record errorcode and continue to the next testcase. This prevents divisions by zero/NaNs in the metrics and keeps the JSON/console output aligned with the actual failure instead of emitting bogus stats.

Suggested change
std::string thistestcase { benchmark.label() + '.' + testcase + '.' }; // move this here (from below)
// If no successful samples (or an error occurred), skip stats/quantiles/KS and report the error
if(!std::holds_alternative<benchmark_result::Ok>(last_errcode) || stats_count == 0) {
rv.add(thistestcase + "errorcode", errorcode(last_errcode));
continue;
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants